Ecological Informatics — Latest Matching Preprints

1

Deep Blueprint: A Literature Review and Guide to Automated Image Classification for Ecologists

Game, C. A.; Piechaud, N.; Howell, K.

2025-11-04 ecology 10.1101/2025.11.03.686223 medRxiv

Top 0.1%

54.2%

1. Deep learning (DL) is a powerful tool to extract ecological information from large image datasets efficiently and consistently. However, applying these methods remains challenging, due in part to the complexity of DL workflows and the dynamic nature of available tools. 2. To address this, we created a practical guide and review, focused on one of the fundamental tasks in automated image analysis: image classification. Our approach integrates commonly used software and highlights key steps - from image acquisition to annotated, model-ready datasets, to training, evaluation and deployment. It is modular and supported by a flexible code base (Python & R) and Graphical User Interfaces (GUIs), enabling adaptation to different models and ecological objectives. The goal is to empower ecologists to confidently incorporate computer vision into their research. 3. We illustrate this approach using an open-source ROV dataset from the Norwegian Sea, featuring deep-sea biotopes defined by multivariate clusters of depth, substrate type, and associated species. To balance accessibility for users with performance, we focused on CNN models from the Ultralytics ML Platform (YOLO V.8 and V.11), comparing the full suite of architectures that range in complexity and efficiency. 4. Cross-validation revealed high overall performance and showed that larger, more complex models are not always superior, with YOLO V.8m performing best (Accuracy ≈ 0.98). Notably, high performance was achieved despite labels being based on both visual and external environmental predictors, suggesting visual features alone were sufficient for classification in this dataset. We highlight that the decision to deploy a model must be made in light of the study's objectives, with domain-based reasoning and experience guiding every stage of implementation. 5. This work offers a practical blueprint for implementing DL in ecological research, promoting broader adoption and supporting reproducibility and more efficient, standardized, and sustainable monitoring, in this case of deep-sea biotopes, which is essential for marine spatial planning.

2

Interpretable and Robust Machine Learning for Exploring and Classifying Soundscape Data

Omprakash, A.; Balakrishnan, R.; Ewers, R. M.; Sethi, S. S.

2024-11-08 ecology 10.1101/2024.11.07.622465 medRxiv

Top 0.1%

53.2%

The adoption of machine learning in Passive Acoustic Monitoring (PAM) has improved prediction accuracy for tasks like species-specific call detection and habitat quality estimation. However, these models often lack interpretability, and PAM generates vast amounts of non-informative data, as soundscapes are typically information sparse. Here, we developed ecologically interpretable methods that accurately predict land use from audio while filtering unwanted data. Audio from habitats in Southern India (evergreen forests, deciduous forests, scrublands, grasslands) was collected and categorised by land use (reference, disturbed, and agriculture). We used Gaussian Mixture Models (GMMs) on top of a Convolutional Neural Network (CNN)-based feature extractor to predict land use. Thresholding based on likelihood values from GMMs improved model accuracy by excluding uninformative data, enabling our method to outperform models such as Random Forests and Support Vector Machines. By analysing areas of acoustic feature space driving predictions, we identified "keystone" soundscape elements for each land use, including both biotic and anthropogenic sources. Our approach provides a novel method for ecologically meaningful interpretation and exploration of large acoustic datasets independent of specific feature extractors. Our study paves the way for soundscape monitoring to deliver robust and trustworthy habitat assessments on scales that would not otherwise be possible.

3

A comparison of convolutional neural networks and few-shot learning in classifying long-tailed distributed tropical bird songs

Zhong, M.; LeBien, J.; Campos-Cerqueira, M.; Aide, T. M.; Miao, Z.; Dodhia, R.; Lavista Ferres, J.

2023-07-27 ecology 10.1101/2023.07.25.550590 medRxiv

Top 0.1%

46.1%

Biodiversity monitoring depends on reliable species identification, but this can often be difficult due to detectability or survey constraints, especially for rare and endangered species. Advances in bioacoustic monitoring and AI-assisted classification are improving our ability to carry out long-term studies of a large proportion of the fauna, even in challenging environments such as remote tropical rainforests. AI classifiers need training data, and this can be a challenge when working with tropical animal communities, which are characterized by high species richness but only a few common species and a long tail of rare species. Here we compare species identification results using two approaches: convolutional neural networks (CNN) and Siamese Neural Networks (SNN), a few-shot learning approach. The goal is to develop methodology that accurately identifies both common and rare species. To do this we collected more than 600 hours of audio recordings from Barro Colorado Island (BCI), Panama, and manually annotated calls from 101 bird species to create the training data set. More than 40% of the species had fewer than 100 annotated calls and some species had fewer than 10. The results showed that Siamese Networks outperformed the more widely used CNNs, especially when the number of annotated calls is low.

4

A multimodal learning approach for automated detection of wildlife trade on social media

Momeny, M.; Kulkarni, R.; Soriano-Redondo, A.; Rinne, J.; Di Minin, E.

2025-09-29 ecology 10.1101/2025.09.24.678024 medRxiv

Top 0.1%

39.6%

Social media data and machine learning methods for automated content analysis are increasingly being used in ecology and conservation science. A current limitation is the lack of methods for automated multimodal analysis of textual and visual content among other data modalities. In this study, we introduce a multimodal content analysis method applied to the investigation of wildlife trade on YouTube. Our approach consists of analyzing text through transformer-based neural networks and video keyframes using convolutional neural networks as part of multimodal filtering, followed by classification where a decision fusion module identifies instances of wildlife trade. The decision fusion module achieved an F-score of 0.72 among textual classifiers for trade detection and of 0.77 among visual classifiers for species identification. This multimodal classification helped detect wildlife trade in 3,715 out of 86,321 filtered YouTube posts, featuring 226 species for sale, including 51 Critically Endangered, 62 Endangered, 60 Vulnerable, 25 Near Threatened, and 28 Least Concern species. The proposed multimodal learning methods can be used more broadly for other ecological and biodiversity conservation applications.

The bigger picture: The unsustainable trade in wildlife is a major driver of biodiversity loss, threatening thousands of species across the Tree of Life. While online platforms have become popular spaces for advertising wildlife and exotic pets for sale, monitoring these platforms remains extremely challenging. Traditional surveillance methods are not scalable, and automated tools have typically focused on either text or image analysis in isolation, limiting their effectiveness in identifying nuanced instances of wildlife trade. Our study introduces a multimodal machine learning framework that integrates textual and visual data to detect potential wildlife trade on YouTube. By combining natural language processing with deep learning for image analysis, and filtering millions of posts down to those most relevant, our method significantly improves detection accuracy. This dual-layered approach uncovered thousands of posts featuring hundreds of species, many of which are threatened. This work demonstrates how advances in machine learning can support ecological monitoring and conservation by providing timely, data-driven insights into online trade networks. In the pursuit of reducing biodiversity loss, this study offers an approach for bridging the gap between online behavior and real-world ecological outcomes.

Highlights:
- Introduces a multimodal content analysis approach for detecting wildlife trade on YouTube by integrating textual and visual data.
- A multimodal filtering technique reduces irrelevant text and video content, enhancing analytical efficiency.
- A decision fusion module then combines results from text and video filtering, improving wildlife trade detection accuracy.
- The proposed methods are applicable across multiple online platforms and suitable for diverse tasks in ecology and biodiversity conservation.

5

The Freshwater Sounds Archive

Greenhalgh, J. A.; Akmentins, M.; Boullhesen, M.; Brejao, G. L.; Bowman, J. C.; Briers, R. A.; Campbell, K.; Clark, A.; Coen, M.; Desjonqueres, C.; Gaston, S.; Gottesman, B. L.; Jones, I. T.; Lahoz-Monfort, J. J.; Lindsay, E.; Rodriguez, F. M.; Navarrete-Mier, F.; Norton, M.; Las Casas e Novaes, M. C.; Okazaki, S.; Polajnar, J.; Ribeiro, M. C.; Roberts, L.; Rothenberg, D.; Sabet, S. S.; Satish, R.; Spriel, B.; Stankovic, D.; Velde, K. t.; Timperley, J. H.; Turlington, K.; Walker, J. R.; Valverde, M. P.; Cox, K.; Looby, A.

2025-05-11 ecology 10.1101/2025.05.07.652412 medRxiv

Top 0.1%

39.5%

Freshwater ecosystems are full of underwater sounds produced by amphibians, aquatic arthropods, reptiles, plants, fishes, and methane bubbles escaping from the sediment. Although much headway has been made in recent years investigating the overall soundscapes of various freshwater ecosystems around the world, there remains a significant knowledge gap in our collective inability to accurately and reliably link recorded sounds with the species that produced them. Here, we present The Freshwater Sounds Archive, a new global initiative, which seeks to address this knowledge gap by collating species-specific freshwater sound recordings into a publicly available database. By means of metadata collection, we also present a snapshot of the species studied, the recording equipment, and recording parameters used by freshwater ecoacousticians globally. In total, 61 entries were submitted to the archive between the 4th of March 2023 and the 30th of April 2025, representing 16 countries and 6 continents. The most numerous taxonomic group was arthropods (29 entries), followed by fishes (14 entries), amphibians (10 entries), macrophytes (7 entries), and a freshwater mollusk (1 entry). The majority of the submissions were from European countries (27 entries), of which the United Kingdom was the most represented with 14 entries. The next most represented region was North America (11 entries), followed by South America (8 entries), Oceania and Asia (5 entries each), Africa (3 entries), and the Middle East and Central America with 1 entry each. The global south, polar regions, and areas with an elevation >500 m (asl) were underrepresented. The field of freshwater ecoacoustics to date has largely focused on the analysis of sound types due to a current lack of knowledge of species-specific sounds. 
The Freshwater Sounds Archive presents an opportunity to move beyond the sound type approach, and towards an approach with higher taxonomic resolution, ultimately resulting in species-specific descriptions. Furthermore, The Freshwater Sounds Archive will provide freshwater ecoacousticians with one of the main tools required to start creating annotated training datasets for machine learning models from soundscape recordings by referring to known species sounds present in the archive. In the long term, this will result in the automatic detection and classification of species-specific freshwater sounds from soundscape recordings, such as those of indicator, invasive, and endangered species.

6

The application of neural networks to classify dolphin echolocation clicks

Seydi, V.; Chapuis, L.; Veneruso, G.; Balaguru, S.; Bristow, N.; Mills, D.; Le Vay, L.

2022-06-16 ecology 10.1101/2022.06.14.496047 medRxiv

Top 0.1%

36.6%

Passive acoustic monitoring (PAM) is a common approach to monitor marine mammal populations, for species of dolphins, porpoises and whales that use sound for navigation, feeding and communication. PAM produces large datasets which benefit from the application of machine learning algorithms to automatically detect and classify the vocalisations of these animals. We present a deep learning approach for the classification of dolphin echolocation clicks into two species groups in an environment with high background noise. We compare the use of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), feeding the models either raw waveform data or spectrograms. We show that both models perform well, with the highest performance achieved by a CNN fed with spectrograms (F1 score 97%) and an RNN fitted with Gated Recurrent Units (GRU) and fed with raw data (F1 score 96%). We recommend the use of such models to classify echolocation clicks in marine environments where background noise levels exhibit high spatial and temporal variance. In particular, the RNN showed excellent performance while being fed with raw data, which reduces processing time and storage. Deep learning automatically extracts effective features from the raw waveform in the training process through multiple layers of the model, without the need to rely on feature extraction in a separate pre-processing step.

7

Location Invariant Animal Recognition Using Mixed Source Datasets and Deep Learning

Shepley, A. J.; Falzon, D. G.; Meek, P.; Kwan, P.

2020-05-15 ecology 10.1101/2020.05.13.094896 medRxiv

Top 0.1%

34.9%

1. A time-consuming challenge faced by camera trap practitioners all over the world is the extraction of meaningful data from images to inform ecological management. The primary methods of image processing used by practitioners include manual analysis and citizen science. An increasingly popular alternative is automated image classification software. However, most automated solutions are not sufficiently robust to be deployed on a large scale. Key challenges include limited access to images for each species and lack of location invariance when transferring models between sites. This prevents optimal use of ecological data and results in significant expenditure of time and resources to annotate and retrain deep learning models. 2. In this study, we aimed to (a) assess the value of publicly available non-iconic FlickR images in the training of deep learning models for camera trap object detection, (b) develop an out-of-the-box location invariant automated camera trap image processing solution for ecologists using deep transfer learning and (c) explore the use of small subsets of camera trap images in optimisation of a FlickR trained deep learning model for high precision ecological object detection. 3. We collected and annotated a dataset of images of "pigs" (Sus scrofa and Phacochoerus africanus) from the consumer image sharing website FlickR. These images were used to achieve transfer learning using a RetinaNet model in the task of object detection. We compared the performance of this model to the performance of models trained on combinations of camera trap images obtained from five different projects, each characterised by 5 different geographical regions. Furthermore, we explored optimisation of the FlickR model via infusion of small subsets of camera trap images to increase robustness in difficult images. 4. In most cases, the mean Average Precision (mAP) of the FlickR trained model when tested on out of sample camera trap sites (67.21-91.92%) was significantly higher than the mAP achieved by models trained on only one geographical location (4.42-90.8%) and rivalled the mAP of models trained on mixed camera trap datasets (68.96-92.75%). The infusion of camera trap images into the FlickR training further improved AP by 5.10-22.32% to 83.60-97.02%. 5. Ecology researchers can use FlickR images in the training of automated deep learning solutions for camera trap image processing to significantly reduce time and resource expenditure by allowing the development of location invariant, highly robust out-of-the-box solutions. This would allow AI technologies to be deployed on a large scale in ecological applications.

8

Acoustic Signatures of the Cerrado: Machine Learning Reveals Unique Soundscapes Across Diverse Phytogeographies

Daleffi da Silva, B.; Padovese, L. R.

2024-01-15 ecology 10.1101/2024.01.12.575467 medRxiv

Top 0.1%

34.6%

This article explores the application of machine learning techniques in acoustic ecology to classify the formations of the Brazilian Cerrado (Forest, Savanna, and Grassland) based on their soundscapes. Considering the Cerrado's importance for biodiversity and hydrology, as well as the challenges the biome faces from agricultural expansion, the study seeks more efficient and economical methods for identifying its physiognomies. Five statistical models were developed and evaluated, using both traditional Machine Learning and Deep Learning, with Mel-Frequency Cepstral Coefficients (MFCCs) and spectrogram images as input variables. The performance of these models was measured by accuracy, precision, and recall metrics, revealing a superiority of the Convolutional Neural Network (CNN), which, despite requiring greater computational cost and training time, provided high precision in the classifications and valuable insights through the application of the LIME explainability technique. Moreover, the study proposes a majority vote classification methodology for frequently observed events, enabling reliable classifications through models with moderate performance. It is concluded that the choice of the ideal model for the classification of soundscapes of the Cerrado should consider a balance between accuracy, operational complexity, and efficiency. The conclusions of this study offer relevant directions for future research and the application of monitoring technologies in conservation and recovery efforts of biomes.
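
The majority-vote idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the example labels, and the per-clip accuracy of 0.7 are hypothetical. The probability calculation also assumes that per-clip predictions are independent and counts only a strict majority of correct votes, which for three classes gives a lower bound on how often the vote is right.

```python
from collections import Counter
from math import comb


def majority_vote(labels):
    """Return the most frequent label among several per-clip predictions."""
    return Counter(labels).most_common(1)[0][0]


def strict_majority_probability(p, n):
    """Probability that more than half of n independent votes are correct,
    given each single vote is correct with probability p."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


# Hypothetical predictions for five clips recorded at the same site.
votes = ["Savanna", "Forest", "Savanna", "Grassland", "Savanna"]
site_label = majority_vote(votes)

# A classifier that is right 70% of the time per clip (assumed value)
# becomes more reliable when 11 clips are pooled.
pooled = strict_majority_probability(0.7, 11)
print(site_label, round(pooled, 3))
```

Under the independence assumption, pooling more clips raises the chance that the vote is correct. That is one way to read the claim that models with moderate performance can still give reliable classifications for frequently observed events.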

9

Demystifying image-based machine learning: A practical guide to automated analysis of field imagery using modern machine learning tools

Belcher, B. T.; Bower, E. H.; Burford, B.; Celis, M. R.; Fahimipour, A. K.; Guevara, I. L.; Katija, K.; Khokhar, Z.; Manjunath, A.; Nelson, S.; Olivetti, S.; Orenstein, E.; Saleh, M. H.; Vaca, B.; Valladares, S.; Hein, S. A.; Hein, A. M.

2022-12-27 ecology 10.1101/2022.12.24.521836 medRxiv

Top 0.1%

34.5%

Image-based machine learning methods are quickly becoming among the most widely-used forms of data analysis across science, technology, and engineering. These methods are powerful because they can rapidly and automatically extract rich contextual and spatial information from images, a process that has historically required a large amount of manual labor. The potential of image-based machine learning methods to change how researchers study the ocean has been demonstrated through a diverse range of recent applications. However, despite their promise, machine learning tools are still under-exploited in many domains including species and environmental monitoring, biodiversity surveys, fisheries abundance and size estimation, rare event and species detection, the study of wild animal behavior, and citizen science. Our objective in this article is to provide an approachable, application-oriented guide to help researchers apply image-based machine learning methods effectively to their own research problems. Using a case study, we describe how to prepare data, train and deploy models, and avoid common pitfalls that can cause models to underperform. Importantly, we discuss how to diagnose problems that can cause poor model performance on new imagery to build robust tools that can vastly accelerate data acquisition in the marine realm. Code to perform our analyses is provided at https://github.com/heinsense2/AIO_CaseStudy

10

Deep learning assessment of cultural ecosystem services from social media images

Cardoso, A. S.; Renna, F.; Alcaraz-Segura, D.; Vaz, A. S.

2021-06-23 ecology 10.1101/2021.06.23.449176 medRxiv

Top 0.1%

32.9%

Crowdsourced social media data has become popular in the assessment of cultural ecosystem services (CES). Advances in deep learning show great potential for the timely assessment of CES at large scales. Here, we describe a procedure for automating the assessment of image elements pertaining to CES from social media. We focus on a binary (natural, human) and a multiclass (posing, species, nature, landscape, human activities, human structures) classification of those elements using two Convolutional Neural Networks (CNNs; VGG16 and ResNet152) with weights from two large datasets (Places365 and ImageNet) and from our own dataset. We train those CNNs on Flickr and Wikiloc images from the Peneda-Geres region (Portugal) and evaluate their transferability to wider areas, using Sierra Nevada (Spain) as a test. CNNs trained for Peneda-Geres performed well, with results for the binary classification (F1-score > 80%) exceeding those for the multiclass classification (> 60%). CNNs pre-trained with Places365 and ImageNet data performed significantly better than with our data. Model performance decreased when transferred to Sierra Nevada, but remained satisfactory (> 60%). The combination of manual annotations, freely available CNNs and pre-trained local datasets thereby shows great relevance to support automated CES assessments from social media.

11

Forecasting the numbers of disease vectors with deep learning

Ceia-Hasse, A.; Sousa, C. A.; Gouveia, B. R.; Capinha, C.

2022-11-24 ecology 10.1101/2022.11.22.517519 medRxiv

Top 0.1%

31.1%

Arboviral diseases such as dengue, Zika, chikungunya or yellow fever are a worldwide concern. The abundance of vector species plays a key role in the emergence of outbreaks of these diseases, so forecasting these numbers is fundamental in preventive risk assessment. Here we describe and demonstrate a novel approach that uses state-of-the-art deep learning algorithms to forecast disease vector numbers. Unlike classical statistical and machine learning methods, deep learning models use time series data directly as predictors and identify the features that are most relevant from a predictive perspective. We demonstrate the application of this approach to predict temporal trends in the number of Aedes aegypti mosquito eggs across Madeira Island for the period 2013 to 2019. Specifically, we apply the deep learning models to predict whether, in the following week, the number of Ae. aegypti eggs will remain unchanged, or whether it will increase or decrease, considering different percentages of change. We obtained high predictive accuracy for all years considered (mean AUC = 0.92 ± 0.05 sd). We also found that the preceding number of eggs is a highly informative predictor of future numbers. Linking our approach to disease transmission or importation models will contribute to operational, early warning systems of arboviral disease risk.
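
The prediction target described here (unchanged, increase, or decrease in the following week, for a given percentage of change) can be sketched as a simple labelling step. This is an illustration only: the abstract does not give the thresholds, the handling of zero counts, or any code, so the 10% threshold, the zero-count rule, the function names, and the example counts below are all assumptions.

```python
def label_change(current, following, threshold=0.10):
    """Label next week's count relative to this week's count.

    A relative change larger than `threshold` in either direction counts as
    an increase or decrease; anything smaller is 'unchanged'.
    """
    if current == 0:
        # Assumed rule: relative change is undefined at zero, so any eggs
        # appearing after a zero week count as an increase.
        return "increase" if following > 0 else "unchanged"
    relative_change = (following - current) / current
    if relative_change > threshold:
        return "increase"
    if relative_change < -threshold:
        return "decrease"
    return "unchanged"


def make_targets(weekly_counts, threshold=0.10):
    """Pair each week with the next one and label the change between them."""
    return [
        label_change(this_week, next_week, threshold)
        for this_week, next_week in zip(weekly_counts, weekly_counts[1:])
    ]


# Made-up weekly egg counts, for illustration only.
weekly_eggs = [120, 125, 180, 90, 90, 0, 15]
print(make_targets(weekly_eggs))
```

Labels of this kind could serve as the classes a time-series classifier is trained to predict. Changing `threshold` corresponds to "considering different percentages of change".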

12

species2vec: A novel method for species representation

Angelov, B.

2019-12-22 ecology 10.1101/461996 medRxiv

Top 0.1%

30.8%

Word embeddings are omnipresent in Natural Language Processing (NLP) tasks. The same technology which defines words by their context can also define biological species. This study showcases this new method - species embedding (species2vec). By proximity sorting of 6761594 mammal observations from the whole world (2862 different species), we are able to create a training corpus for the skip-gram model. The resulting species embeddings are tested in an environmental classification task. The classifier performance confirms the utility of those embeddings in preserving the relationships between species, and also being representative of species consortia in an environment.

13

Occurrence cubes: a new paradigm for aggregating species occurrence data

Oldoni, D.; Groom, Q.; Adriaens, T.; Davis, A. J. S.; Reyserhove, L.; Strubbe, D.; Vanderhoeven, S.; Desmet, P.

2020-03-25 ecology 10.1101/2020.03.23.983601 medRxiv

Top 0.1%

28.3%

In this paper we describe a method of aggregating species occurrence data into what we call "occurrence cubes". The aggregated data can be perceived as a cube with three dimensions - taxonomic, temporal and geographic - and takes into account the spatial uncertainty of each occurrence. The aggregation level of each of the three dimensions can be adapted to the scope. Built on Open Science principles, the method is easily automated and reproducible, and can be used for species trend indicators, maps and distribution models. We are using the method to aggregate species occurrence data for Europe per taxon, year and 1 km² European reference grid cell, to feed indicators and risk mapping/modelling for the Tracking Invasive Alien Species (TrIAS) project.
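
A minimal sketch of the three-dimensional aggregation (taxon, year, grid cell) may help make the idea concrete. It is not the authors' code. It assumes coordinates already projected to metres, uses hypothetical field names and made-up records, and carries spatial uncertainty along only by keeping the smallest value per cell, which does not reproduce the paper's own treatment of uncertainty.

```python
from collections import defaultdict
from math import floor


def grid_cell(x_m, y_m, cell_size_m=1000):
    """Return the lower-left corner (in metres) of the cell containing a point."""
    return (
        floor(x_m / cell_size_m) * cell_size_m,
        floor(y_m / cell_size_m) * cell_size_m,
    )


def build_cube(occurrences, cell_size_m=1000):
    """Aggregate occurrence records by (taxon, year, grid cell)."""
    cube = defaultdict(lambda: {"n": 0, "min_uncertainty_m": None})
    for occ in occurrences:
        key = (
            occ["taxon"],
            occ["year"],
            grid_cell(occ["x_m"], occ["y_m"], cell_size_m),
        )
        cell = cube[key]
        cell["n"] += 1
        uncertainty = occ["uncertainty_m"]
        if cell["min_uncertainty_m"] is None or uncertainty < cell["min_uncertainty_m"]:
            cell["min_uncertainty_m"] = uncertainty
    return dict(cube)


# Made-up records, for illustration only.
records = [
    {"taxon": "Taxon A", "year": 2018, "x_m": 3925400, "y_m": 3096800, "uncertainty_m": 30},
    {"taxon": "Taxon A", "year": 2018, "x_m": 3925900, "y_m": 3096100, "uncertainty_m": 250},
    {"taxon": "Taxon A", "year": 2019, "x_m": 3926100, "y_m": 3096100, "uncertainty_m": 10},
]
cube = build_cube(records)
for key, value in sorted(cube.items()):
    print(key, value)
```

Changing `cell_size_m`, or replacing `year` with a coarser or finer time unit, corresponds to adapting the aggregation level of a dimension to the scope of the analysis.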

14

Zero-shot animal behavior classification with vision-language foundation models

Dussert, G.; Miele, V.; Van Reeth, C.; Delestrade, A.; Dray, S.; Chamaille-Jammes, S.

2024-07-07 ecology 10.1101/2024.04.05.588078 medRxiv

Top 0.1%

27.2%

1. Understanding the behavior of animals in their natural habitats is critical to ecology and conservation. Camera traps are a powerful tool to collect such data with minimal disturbance. However, they produce a very large quantity of images, which can make human-based annotation cumbersome or even impossible. While automated species identification with artificial intelligence has made impressive progress, automatic classification of animal behaviors in camera trap images remains a developing field. 2. Here, we explore the potential of foundation models, specifically Vision Language Models (VLMs), to perform this task without the need to first train a model, which would require some level of human-based annotation. Using an original dataset of alpine fauna with behaviors annotated by participatory science, we investigate the zero-shot capabilities of different kinds of recent VLMs to predict behaviors and estimate behavior-specific diel activity patterns in three ungulate species. 3. Our results show that using these methods, it is possible to achieve accuracies over 91% in behavior classification and produce activity patterns that closely align with those derived from participatory science data (overlap indexes between 84% and 90%). 4. These findings demonstrate the potential of foundation models and vision-language models in ecological research. Ecologists are encouraged to adopt these new methods and leverage their full capabilities to facilitate ecological studies.

15

Advancing Wildlife Image Analysis: A Graph Attention Contrastive Learning Approach for Region-Specific Mammal Classification

Kim, Y.; Kim, C.-H.; Yun, C.-S.; Joo, G.-J.

2025-09-18 ecology 10.1101/2025.09.17.676694 medRxiv

Top 0.1%

27.1%

1. Camera traps have become a cornerstone of wildlife ecological research, yet the manual analysis of the millions of images they generate requires substantial time and resources. Although deep learning-based automation has emerged as a promising solution, existing global general-purpose models exhibit limitations in precisely recognizing local endemic species and adapting to unique local ecosystems. 2. This study developed a high-performance classification model optimized for native species. A large-scale "Korean Wildlife Dataset" was constructed from data collected across diverse domestic habitats, and a novel architecture was proposed to overcome limitations of conventional CNNs. The proposed Graph Attention Contrastive Learning (GACL) model is structured as a two-stage pipeline. Stage one employs YOLOv5 and MegaDetector to detect animals, humans, and vehicles, filtering valid images. Stage two performs fine-grained species classification. GACL captures structural relationships among object parts using a Graph Attention Transformer (GAT) and aligns semantic correspondence between images and textual descriptions via Parallel Contrastive Learning, enabling deeper understanding beyond simple visual features. 3. Evaluation on an independent test set demonstrated that the proposed model achieved robust classification performance, with an overall accuracy of 96.83% across four classes (Wildboar, Goral, Deers, and Other). Notably, in a comparative analysis against a global general-purpose model, our model showed distinct advantages in the precise recognition of endemic species. Furthermore, it exhibited a lower false positive rate in identifying animals in empty images, confirming its potential to enhance the efficiency of the data cleaning process. 4. Beyond technical accuracy, this study highlights that region-specific AI models that reflect local ecological characteristics can provide substantial practical value for wildlife monitoring and biodiversity conservation. Future work will require continuous efforts in data diversification and model lightweighting to further improve model robustness and practicality.

16

Assessing the quality of generative artificial intelligence for science communication in environmental research

Worden, D.; Richards, D.

2024-11-13 scientific communication and education 10.1101/2024.11.11.623072 medRxiv

Top 0.1%

27.0%

The adoption of Generative Artificial Intelligence (GenAI) tools is drastically changing the way that researchers work. While debate on the quality of GenAI outputs continues, there is optimism that GenAI may help human experts to address the most significant environmental challenges facing society. No previous research has quantitatively assessed the quality of GenAI outputs intended to inform environmental management decisions. Here we surveyed 98 environmental scientists and used their expertise to assess the quality of human and GenAI content relevant to their discipline. We analysed the quality and relative preference between human and GenAI content across three use cases in environmental science outreach and communication. Our results indicate that the GenAI content was generally deemed adequate in quality by human experts, with an average of 82% of respondents indicating a quality of "adequate" or better across the three use cases. Respondents exhibited strong preferences for GenAI over human-only content when using GenAI imagery of future park management scenarios. For the use cases of generating a wetland planting guide and answering a question about invasive species management, preferences were heterogeneous amongst respondents. Our findings raise substantive questions about GenAI content as a complement to human expertise when research is transferred to public audiences.

17

An algorithm for the identification of indicator taxonomic units and their use in analyses of ecosystem state

de la Vega, H.; Falco, L. B.; Saravia, L. A.; Sandler, R. V.; Duhour, A.; Velazco, V. N.; Coviella, C. E.

2022-05-18 ecology 10.1101/2022.05.16.492087 medRxiv

Top 0.1%

26.7%

Biological community structure can be used as an ecological state descriptor, and the sensitivity of some taxonomic groups or biological entities to environmental conditions allows for their use as ecological state indicators. This work describes a mathematical methodology developed for the identification of such taxonomic units when comparing environments or ecosystems under different anthropic impacts. Based on this methodology, a freely downloadable, easy-to-use R package was developed (Ecoindicators, DOI: 10.5281/zenodo.5772829). Based solely on presence or absence information, the method identifies indicator taxonomic units for each environment, estimates the belonging of any additional samples to a given environment, approximates the ecological niche of any taxonomic unit based on two or more selected environmental factors, and returns the frequency of any taxonomic unit in a selected combination of environmental factors. Given a new sample, the approximated ecological niches of the species present in it can be used to estimate the sample's physicochemical parameters. These analyses can be performed simultaneously for two or more taxonomic units. This paper provides a mathematical description of how the method was developed. One of the advantages of this method and of the R package is that they can be used for any ecosystem for which there is a suitable biological dataset associated with environmental factors. In addition, both the mathematical procedure and the package can be tailored by other researchers to fit their own needs. We also expect other developers to further improve it.

18

One-Class Bioacoustic Detector for Monitoring the Critically Endangered Pied Tamarin (Saguinus bicolor)

Colonna, J. G.; Sobroza, T. V.; Gordo, M.; Nakamura, E. F.; Frery, A.

2025-10-13 ecology 10.1101/2025.10.11.681843 medRxiv

Top 0.1%

26.7%


The pied tamarin (Saguinus bicolor) is a critically endangered primate with a small geographic range that includes fragmented urban forest mosaics in Amazonia, where habitat subdivision and anthropogenic actions complicate its survival and monitoring. Passive acoustic monitoring (PAM) offers a convenient, noninvasive way to track this species, yet open-set rainforest soundscapes make single-species detection challenging. We present a machine-learning pipeline with a very low false-positive rate, appropriate for downstream inference. The method combines a band-pass filter (5 kHz to 10 kHz), Perch bioacoustic embeddings (deep learning), and a One-Class SVM (OCSVM) applied to sliding windows of continuous audio recordings to detect S. bicolor calls. We train on a reduced dataset of labeled calls and validate against diverse out-of-class audio (birds, anurans, anthropophony, and geophony/insects), then test on long, cross-site recordings. The approach achieves high discrimination on held-out negatives and produces a very low false-positive rate in continuous, real-world audio, with a precision of 0.86. Finally, we pair detections with a single-site occupancy model in a cross-site setting to illustrate end-to-end utility for conservation monitoring and to estimate the false-negative detection probability in recordings from pied tamarin populations in a different geographic region. Our strategy provides a tool for PAM of S. bicolor that requires minimal manual labeling effort and can be adapted to other open-set, single-species monitoring scenarios. We support reproducibility by releasing a Python package (sauim-detector), installable via pip, that processes an audio file and produces detection timestamps as an Audacity label file (.txt), enabling faster manual verification.

Highlights

- Open-set bioacoustic detector for the pied tamarin. We introduce an open-set pipeline (band-pass, 5 kHz to 10 kHz -> Perch embeddings -> OCSVM) tailored to S. bicolor, filling a gap with no prior species-specific detector.
- Bird-trained Perch embeddings transfer to primates. We show that Perch embeddings trained on birds generalize to S. bicolor vocalizations, enabling cross-taxon reuse without retraining.
- Band-pass filtering reduces background false positives and improves separability. On our evaluation sets, the 5 kHz to 10 kHz filter removed all background FPs and shifted ROC curves up/left, increasing AUC.
- Robust cross-site detection on long recordings with very few false positives. In an approximately 10 min Mindu Park test, the detector prioritized precision, with an observed false-positive rate of 0.03, and improved AUC from 0.74 (raw) to 0.83 (filtered).
- End-to-end monitoring via occupancy modeling. We couple detections with a single-site occupancy estimator and derive a closed-form MLE to estimate the cross-site false-detection probability.
- Software package. We release the open-source Python package, which runs locally to apply our pipeline to an audio file and outputs Audacity-compatible detection labels plus a filtered audio file, streamlining manual verification and reducing effort on large PAM datasets (https://pypi.org/project/sauim-detector).
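
As an illustration of the last step of such a pipeline, the sketch below turns per-window detector decisions into Audacity label lines (tab-separated start time, end time, and label, in seconds). It is not the sauim-detector implementation: the function names, the window and hop lengths, and the label text are hypothetical, and the boolean flags stand in for whatever the classifier would output for each sliding window.

```python
def windows_to_intervals(flags, window_s, hop_s, label="S_bicolor"):
    """Merge runs of consecutive positive windows into (start, end, label) tuples.

    flags    : one boolean per sliding window (True = call detected)
    window_s : window length in seconds (assumed value, not from the paper)
    hop_s    : step between window starts in seconds (assumed value)
    """
    intervals = []
    start = end = None
    for i, flag in enumerate(flags):
        w_start = i * hop_s
        if flag:
            if start is None:
                start = w_start          # a new run of detections begins
            end = w_start + window_s     # extend the run to this window's end
        elif start is not None:
            intervals.append((start, end, label))
            start = None                 # the run was closed by a negative window
    if start is not None:                # close a run that reaches the last window
        intervals.append((start, end, label))
    return intervals


def to_audacity_labels(intervals):
    """Format intervals as Audacity label-track text: start<TAB>end<TAB>label."""
    return "\n".join(f"{s:.6f}\t{e:.6f}\t{lab}" for s, e, lab in intervals)


if __name__ == "__main__":
    flags = [False, True, True, False, True]
    print(to_audacity_labels(windows_to_intervals(flags, window_s=5.0, hop_s=2.5)))
```

Merging adjacent positive windows keeps the label file short, which is what makes manual verification in an audio editor faster than scanning every window individually.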

19

Integration of Deep-Learning and Species Distribution Models for Classification of Animal Species of the Brazilian Fauna

Oliveira, M. B.; Bernardino, H. S.; Vieira, A. B.; Barroso, A. A.; Augusto, D. A.

2026-05-08 ecology 10.64898/2026.05.06.723365 medRxiv

Top 0.1%

26.6%


The automated classification of animals from photos is important in ecology and conservation biology for organizing and understanding the immense diversity of species, as well as facilitating effective conservation and management practices. It is equally important for disease surveillance systems, allowing prompt detection of anomalies in species distributions and boosting citizen-scientist platforms by making user-reported data more accurate and convenient. Image classification uses photos and can also rely on the geographical locations of animals to improve performance. While image classification models have difficulty with low-quality images, unbalanced datasets, and classes with few images, species distribution models have difficulty distinguishing species that coexist in a given region. We propose here strategies for combining image classification models based on deep neural networks with species distribution models using genetic algorithms. The proposal is applied to a real-world dataset comprising fifteen classes of animals from the Brazilian fauna obtained from Fiocruz's citizen-scientist Wildlife Health Information System (SISS-Geo). The SISS-Geo photos portray the reality of animals in their environments, with varying quality, and pose numerous difficulties for classification. Experimental results demonstrate that the proposed integration consistently outperforms standalone models. While individual SDMs achieve Top-1 accuracies of 27.79% (MaxEnt) and 31.76% (Bioclim), and CNN-based classifiers reach 58.17% with ResNet50 and 64.13% with ResNet-152, the hybrid strategies yield substantial improvements. The genetic algorithm-based integration with a single global weight achieves up to 67.96% Top-1 accuracy, whereas the class-specific integration using fifteen parameters attains the best overall performance, reaching 69.03%.
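
The difference between a single global weight and class-specific weights can be sketched as follows. The abstract does not state the combination rule, so the convex combination used here is an assumption, as are the function names and the toy probabilities; in the paper the weights are tuned by a genetic algorithm, whereas here they are simply set by hand.

```python
def fuse(cnn_probs, sdm_probs, weights):
    """Combine per-class CNN and SDM scores as w * cnn + (1 - w) * sdm.

    weights is either one float (global weight) or one float per class
    (class-specific weights). The convex combination is an assumed rule.
    """
    n = len(cnn_probs)
    if isinstance(weights, (int, float)):
        weights = [weights] * n                      # broadcast the global weight
    scores = [w * c + (1 - w) * s
              for w, c, s in zip(weights, cnn_probs, sdm_probs)]
    total = sum(scores)
    # Renormalise so the fused scores sum to 1 when possible.
    return [x / total for x in scores] if total > 0 else scores


def top1(scores):
    """Index of the highest-scoring class."""
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    cnn = [0.5, 0.4, 0.1]   # toy image-classifier probabilities (hypothetical)
    sdm = [0.1, 0.8, 0.1]   # toy habitat suitabilities at the photo location (hypothetical)
    print(top1(fuse(cnn, sdm, 1.0)))               # CNN only
    print(top1(fuse(cnn, sdm, 0.5)))               # global weight
    print(top1(fuse(cnn, sdm, [1.0, 0.0, 0.5])))   # class-specific weights
```

With a global weight, every class trusts the image model to the same degree; class-specific weights let the location signal dominate only for classes where the image model is unreliable.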

20

Quantitative evaluation of internal clustering validation indices using binary datasets

Pakgohar, N.; Lengyel, A.; Botta-Dukat, Z.

2023-08-12 ecology 10.1101/2023.08.09.552566 medRxiv

Top 0.1%

26.3%


Different clustering methods often classify the same dataset differently. Selecting the best clustering solution out of a multitude of alternatives is possible with cluster validation indices. The behavior of validity indices changes with the structure of the sample and the properties of the clustering algorithm. Unique properties of each index cause increasing or decreasing performance in some conditions. Due to the large variety of cluster validation indices, choosing the most suitable index for a given dataset and clustering algorithm is challenging. We aim to assess different internal clustering validation indices. The validity indices considered in the present paper include geometric and non-geometric methods. For this purpose, we used simulated datasets with different noise levels. Each dataset was replicated 20 times. Three clustering algorithms with Jaccard dissimilarity are used, and 27 clustering validation indices are evaluated. The results provide a reliability guideline for the selection of cluster validity indices.
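
To make the setting concrete, the sketch below computes the Jaccard dissimilarity between binary (presence/absence) vectors and uses it in the mean silhouette width, one widely used internal validation index. The abstract does not list the 27 indices evaluated, so choosing the silhouette here is an assumption for illustration, and the function names and toy data are hypothetical.

```python
def jaccard_dissimilarity(a, b):
    """1 - |A and B| / |A or B| for two equal-length binary vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def mean_silhouette(data, labels):
    """Mean silhouette width under Jaccard dissimilarity.

    Requires at least two clusters. Singleton clusters get a width of 0,
    which is a common convention.
    """
    n = len(data)
    d = [[jaccard_dissimilarity(data[i], data[j]) for j in range(n)]
         for i in range(n)]
    widths = []
    for i in range(n):
        own = [d[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            widths.append(0.0)
            continue
        a = sum(own) / len(own)                      # mean distance within own cluster
        b = min(                                     # mean distance to nearest other cluster
            sum(d[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(widths) / n


if __name__ == "__main__":
    plots = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
    print(mean_silhouette(plots, [0, 0, 1, 1]))   # partition matching the structure
    print(mean_silhouette(plots, [0, 1, 0, 1]))   # partition cutting across it
```

An internal index of this kind scores a partition using only the data and the dissimilarity, which is why it can rank competing clustering solutions without reference labels.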